53 research outputs found

    Learning spectro-temporal representations of complex sounds with parameterized neural networks

    Deep learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet these models often lack the interpretability needed to fully understand the exact computations that they perform. Here, we proposed a parameterized neural network layer that computes specific spectro-temporal modulations based on Gabor kernels (Learnable STRFs) and that is fully interpretable. We evaluated the predictive capabilities of this layer on Speech Activity Detection, Speaker Verification, Urban Sound Classification and Zebra Finch Call Type Classification. We found that models based on Learnable STRFs are on par with the different toplines for all tasks, and obtain the best performance for Speech Activity Detection. As this layer is fully interpretable, we used quantitative measures to describe the distribution of the learned spectro-temporal modulations. The filters adapted to each task and focused mostly on low temporal and spectral modulations. The analyses show that the filters learned on human speech have spectro-temporal parameters similar to those measured directly in the human auditory cortex. Finally, we observed that the tasks were organized in a meaningful way: the human vocalization tasks were close to each other, while the bird vocalization task was far from the human vocalization and urban sound tasks.

    Modelling the Quantum Capacitance of Single-layer and Bilayer Graphene

    In this paper, we report the modelling of quantum capacitance in both single-layer and bilayer graphene devices to investigate its temperature dependence. The model accounts for the existence of electron and hole puddles due to local fluctuations of the potential, together with the possibility of finite lifetimes of electronic states, and calculates the quantum capacitance using a Gaussian distribution. The results indicate that the simulations are in agreement with the experimental measurements, which supports the accuracy of the proposed model. In addition, the temperature dependence around the charge neutrality point is reported for both single-layer and bilayer graphene.

    Sampling strategies in Siamese Networks for unsupervised speech representation learning

    Recent studies have investigated siamese network architectures for learning invariant speech representations using same-different side information at the word level. Here we systematically investigate an often-ignored component of siamese networks: the sampling procedure (how pairs of same vs. different tokens are selected). We show that sampling strategies that take into account Zipf's Law, the distribution of speakers, and the proportions of same and different pairs of words significantly impact the performance of the network. In particular, we show that word frequency compression improves learning across a large range of variations in the number of training pairs. This effect does not apply to the same extent to the fully unsupervised setting, where the pairs of same-different words are obtained by spoken term discovery. We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement on the state of the art in unsupervised representation learning using siamese networks. Comment: Conference paper at Interspeech 201

    Impact of tumor-infiltrating lymphocytes on pathological complete response after neoadjuvant chemotherapy in patients with early triple-negative breast cancer

    Description of the subject: Triple-negative breast cancer (TNBC), a breast cancer subtype, is characterized by the lack of expression of both estrogen and progesterone hormone receptors and by the absence of human epidermal growth factor receptor 2 overexpression. Patients with a pathological complete response (pCR) have better disease-free and overall survival than those with residual disease. A high level of tumor-infiltrating lymphocytes (TILs) is associated with a higher response to neoadjuvant chemotherapy (NAC) and a better prognosis. Objective: Evaluation of TILs and their predictive impact in early TNBC in an Algerian population. Methods: We assessed TILs and correlated them with the pCR rate in 94 early TNBC patients treated from 2015 to 2017 who underwent breast microbiopsy, NAC, and then surgery. Results: Among 94 early TNBC patients, 53 (56.4%) achieved pCR and 41 (43.6%) had residual disease. While some clinicopathological factors did not affect pCR, stromal TILs showed a significant correlation with pCR (P < 0.0001). The presence of CD3+, CD4+, CD8+ and CD20+ TILs was also significantly correlated with pCR (P < 0.0001, P = 0.001, P = 0.0003 and P = 0.0001, respectively). Conclusion: Our data showed that TILs were significantly associated with pCR, suggesting that TILs are a predictive biomarker for pCR in early TNBC patients treated with NAC in our cohort.

    XNMT: The eXtensible Neural Machine Translation Toolkit

    This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://github.com/neulab/xnmt. Comment: To be presented at AMTA 2018 Open Source Software Showcas

    Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments

    Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high-resource language, and Xitsonga, a low-resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can outperform the variable-length input feature representation on both evaluations. Recurrent autoencoders, trained without supervision, can yield even better results at the expense of increased computational complexity.

    Enregistrements de longue durée: Opportunités et défis

    Technological advances have enabled the development of lightweight, wearable recorders that collect audio (including speech) lasting up to a whole day. We provide a general description of the technique and lay out the advantages and drawbacks of this methodology. Field linguists may gain a uniquely naturalistic viewpoint of language use as people go about their everyday activities. However, due to their duration, noisiness, and likelihood of containing sensitive information, long-form recordings remain difficult to annotate manually. Open-source tools improve reproducibility and ease of use for researchers, and speech technologists can contribute to this end. Additionally, new approaches to human and automated annotation make the study of speech in long-form recordings increasingly feasible and promising.